terraform init and lock file platform #28041

Closed
ademariag opened this issue Mar 10, 2021 · 19 comments
Labels
enhancement new new issue not yet triaged

Comments

@ademariag

Current Terraform Version

Terraform Version

terraform --version
Terraform v0.14.2

Use-cases

Since terraform v0.14, terraform init will generate a lock file which will only contain details about providers for the current platform.

This means that, when running the same code from multiple platforms, the terraform init command needs to be followed by
terraform.sh providers lock -platform=linux_amd64 -platform=darwin_amd64

If not, when running on a different platform the following error will appear:

Error: Failed to install provider from shared cache
Error while importing hashicorp/random v3.1.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/random 3.1.0 that doesn't match any of the
checksums recorded in the dependency lock file.
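
As a rough illustration of the two-step workflow described above, the following wrapper runs terraform init and then terraform providers lock for each platform passed as an argument. The function name init_and_lock is made up for this sketch; only the two Terraform commands and the -platform flag come from this report.

```shell
#!/bin/sh
# Sketch only: init_and_lock is a hypothetical helper name, not part of Terraform.
# Usage: init_and_lock linux_amd64 darwin_amd64
init_and_lock() {
  if [ "$#" -eq 0 ]; then
    echo "usage: init_and_lock PLATFORM..." >&2
    return 2
  fi

  # Install providers for the current platform first.
  terraform init || return 1

  # Turn each platform name into a -platform=... flag by rewriting "$@" in place.
  for platform in "$@"; do
    shift
    set -- "$@" "-platform=$platform"
  done

  # Record checksums for every requested platform in .terraform.lock.hcl.
  terraform providers lock "$@"
}
```

Calling init_and_lock linux_amd64 darwin_amd64 would then be equivalent to running the two commands by hand.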

Proposal

Option A: Declare desired platforms in terraform code, and have init respect it

It would be nice if the code could dictate which platforms are expected to be initialised, so that the terraform init command would make sure all platform downloads are satisfied.

Option B: Add -platform= flags to terraform init

Make terraform init recognise the platform flags, removing the need to run terraform.sh providers lock

@ademariag
Author

In Terraform's default configuration, terraform init will generate a lock file containing all of the checksums signed by the original provider author across all platforms. However, it's true that currently if you override that behavior to make Terraform install from somewhere other than the origin registry (either via explicit installation method configuration or by enabling the read-through cache) then terraform init becomes constrained by the limited metadata available on those alternate installation methods, in which case you need to use terraform providers lock to achieve a result like terraform init would produce by default.

Originally posted by @apparentlymart in #27241 (comment)

@apparentlymart this is not what I observe (unless I misunderstand/misinterpret what you said).
When I run terraform init, only the current platform's checksums are recorded, leading to the error:

Error: Failed to install provider from shared cache
Error while importing hashicorp/random v3.1.0 from the shared cache directory:
the provider cache at .terraform/providers has a copy of
registry.terraform.io/hashicorp/random 3.1.0 that doesn't match any of the
checksums recorded in the dependency lock file.

@apparentlymart
Member

apparentlymart commented Mar 18, 2021

The "shared cache" mentioned in this error message is the "read-through cache" I was talking about in my comment there, so I guess you have that enabled and thus what I wrote in that comment doesn't apply.

That's a situation where currently we typically need to run terraform providers lock explicitly after each change to the selected providers, to make sure that the lock file includes all of the checksums recorded in the registry even though terraform init has been configured not to access the registry.

@bpoland

bpoland commented Apr 9, 2021

We're also hitting this issue after upgrading to terraform 0.14 -- any time we update providers, we have to make sure to re-run terraform providers lock -platform=darwin_amd64 -platform=linux_amd64 to avoid cross platform issues.

It would be great if terraform automatically populated checksums for all available platforms with the "read-through cache" (or would that mean having to download the provider for all platforms too?). Option A from the original post would be fine too, or option B as long as it remembers the platforms that were initially selected and populates them automatically when updating a provider version.
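
One way to approximate the "remember the selected platforms" idea outside Terraform is to keep the platform list in a file next to the configuration and rebuild the flags from it whenever providers change. The file name .terraform-platforms and both function names below are inventions for this sketch; Terraform itself does not read such a file.

```shell
#!/bin/sh
# Hypothetical convention: one platform per line in .terraform-platforms,
# e.g. "linux_amd64". Blank lines and lines starting with # are ignored.
PLATFORMS_FILE="${PLATFORMS_FILE:-.terraform-platforms}"

# Print one -platform=... flag per platform listed in the file.
platform_flags() {
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      ''|'#'*) continue ;;
    esac
    printf '%s\n' "-platform=$line"
  done < "$PLATFORMS_FILE"
}

# Re-run the lock step for every remembered platform.
relock_all_platforms() {
  # Word splitting is intentional here: platform names contain no spaces.
  # shellcheck disable=SC2046
  terraform providers lock $(platform_flags)
}
```

Committing the platforms file alongside .terraform.lock.hcl would let every team member run relock_all_platforms after a provider update without retyping the list.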

@dee-kryvenko

dee-kryvenko commented Jul 16, 2021

Yeah I just confirmed what @apparentlymart said in #27241 (comment) is not true...

Here's what I did. I started with a simple main.tf:

resource "null_resource" "test" {}

I have my local laptop and a linux container both having 0.14.0 installed.

  1. Locally: tf init - it generated .terraform.lock.hcl
  2. Locally: tf providers lock - there were no changes to the .terraform.lock.hcl
  3. Locally: removed .terraform folder and did export TF_PLUGIN_CACHE_DIR=$(cd .. && pwd)/tf-cache
  4. Locally: tf init works - no changes to .terraform.lock.hcl
  5. Locally: removed .terraform and .terraform.lock.hcl leaving TF_PLUGIN_CACHE_DIR set
  6. Locally: tf init works - but generated different .terraform.lock.hcl - all the entries except one are gone!
  7. Locally: tf providers lock brought all the entries back (spoiler: these are all darwin_amd64 entries; they are zh and h1 hash types, which as I understand it are for different tf versions, as explained here: https://www.terraform.io/docs/language/dependency-lock.html#new-provider-package-checksums)
  8. So clearly whether TF_PLUGIN_CACHE_DIR is set or not AND whether it is empty or not makes a difference in how tf init works but not tf providers lock
  9. Container: removed .terraform and set export TF_PLUGIN_CACHE_DIR=$(cd .. && pwd)/tf-cache-container
  10. Container: tf init works - noticed it added new entry to .terraform.lock.hcl, starts ringing a bell...
  11. Container: reverted that .terraform.lock.hcl change from 10) but left TF_PLUGIN_CACHE_DIR intact - first run it was set but empty, now it's populated with cache after the previous step
  12. Container: run tf init again - boom! 💥
  13. Locally: removed .terraform and did tf providers lock -platform=linux_amd64 -platform=darwin_amd64 - it added the same hash that step 10) added before
  14. Container: tf init now works in the container with pre-populated cache
  15. Container: tf providers lock in container changed nothing

TL;DR: There are at least two layers to this issue. First, tf init behaves differently when a cache is set. I actually kind of like that, though it could be more explicit and transparent: not reaching out to the remote HC registry and using only the cache location, if set, kind of makes sense. Maybe the error message could be clearer and suggest that users run tf providers lock.

The second layer is that tf providers lock is not cross-platform. It produces different results and a different lock on different platforms. That feels unintentional and is probably a bug, at least according to @apparentlymart, but the complexity of reproducing it without tripping over the first layer probably kept @apparentlymart from seeing that there is an actual problem. Currently, the only way to make my tf code cross-platform when I'm using the cache is to run tf providers lock -platform=linux_amd64 -platform=darwin_amd64 on my laptop before committing the code.

To further add context... both tf init and tf providers lock are generating the following lock on darwin:

# This file is maintained automatically by "terraform init".
# Manual edits may be lost in future updates.

provider "registry.terraform.io/hashicorp/null" {
  version = "3.1.0"
  hashes = [
    "h1:xhbHC6in3nQryvTQBWKxebi3inG5OCgHgc4fRxL0ymc=",
    "zh:02a1675fd8de126a00460942aaae242e65ca3380b5bb192e8773ef3da9073fd2",
    "zh:53e30545ff8926a8e30ad30648991ca8b93b6fa496272cd23b26763c8ee84515",
    "zh:5f9200bf708913621d0f6514179d89700e9aa3097c77dac730e8ba6e5901d521",
    "zh:9ebf4d9704faba06b3ec7242c773c0fbfe12d62db7d00356d4f55385fc69bfb2",
    "zh:a6576c81adc70326e4e1c999c04ad9ca37113a6e925aefab4765e5a5198efa7e",
    "zh:a8a42d13346347aff6c63a37cda9b2c6aa5cc384a55b2fe6d6adfa390e609c53",
    "zh:c797744d08a5307d50210e0454f91ca4d1c7621c68740441cf4579390452321d",
    "zh:cecb6a304046df34c11229f20a80b24b1603960b794d68361a67c5efe58e62b8",
    "zh:e1371aa1e502000d9974cfaff5be4cfa02f47b17400005a16f14d2ef30dc2a70",
    "zh:fc39cc1fe71234a0b0369d5c5c7f876c71b956d23d7d6f518289737a001ba69b",
    "zh:fea4227271ebf7d9e2b61b89ce2328c7262acd9fd190e1fd6d15a591abfa848e",
  ]
}

Note there's only one h1 entry. If I run it on linux - tf will add a second h1 entry. So that clearly illustrates what's the issue here. If my linux had TF_PLUGIN_CACHE_DIR set and cache already had this plugin - it will fail with that error message on the top.
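
To check how many entries of each hash scheme a lock file contains, a small helper along these lines may be enough. It is a sketch that relies on each hash sitting on its own line as a quoted string, as in the file shown above; the function name count_hashes is hypothetical.

```shell
#!/bin/sh
# Count hash entries of one scheme ("h1" or "zh") in a dependency lock file.
# Usage: count_hashes h1 .terraform.lock.hcl
count_hashes() {
  scheme=$1
  lock_file=${2:-.terraform.lock.hcl}
  # Each hash sits on its own line as a quoted string such as "h1:...",
  # so counting matching lines counts entries across all providers in the file.
  # grep -c exits non-zero when the count is zero, hence the "|| true".
  grep -c "^[[:space:]]*\"${scheme}:" "$lock_file" || true
}
```

Comparing the output of count_hashes h1 before and after a tf providers lock run with several -platform flags shows whether additional h1 entries were recorded.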

@dee-kryvenko

dee-kryvenko commented Jul 16, 2021

Further down - this actually also defeats the whole purpose of the lock. Even if I am not using the cache, I think all is fine and I am secure, but in fact my TF in CI is getting the hash from the remote every time and I don't even know about it. That means if the remote is compromised, the lock file would not protect me even though I thought it would. Now this smells like a security issue... I'm gonna ping security@hashicorp.com
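
A CI job can at least surface the situation where a command silently rewrites the lock file. The sketch below fingerprints the file before and after running an arbitrary command and fails if it changed. The helper names and the LOCK_FILE variable are hypothetical, and cksum is used only for change detection, not as a security measure.

```shell
#!/bin/sh
# Sketch only: lock_fingerprint and lock_unchanged_after are hypothetical helpers.

# Print a fingerprint of the lock file, or "absent" if it does not exist.
# cksum is not cryptographic; it is used here purely to detect changes.
lock_fingerprint() {
  if [ -f "$1" ]; then
    cksum < "$1"
  else
    echo "absent"
  fi
}

# Run a command and report whether it modified the dependency lock file.
# Returns 0 if unchanged, 1 if changed, 2 if the command failed or was missing.
# Usage: lock_unchanged_after terraform init
lock_unchanged_after() {
  lock_file=${LOCK_FILE:-.terraform.lock.hcl}
  if [ "$#" -eq 0 ]; then
    echo "usage: lock_unchanged_after COMMAND..." >&2
    return 2
  fi

  before=$(lock_fingerprint "$lock_file")
  "$@" || return 2
  after=$(lock_fingerprint "$lock_file")

  if [ "$before" != "$after" ]; then
    echo "lock file changed by: $*" >&2
    return 1
  fi
  return 0
}
```

This does not prevent the update; it only turns an unnoticed change into a visible failure that someone has to review.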

@ghost

ghost commented Jul 19, 2021

Further down - this actually also defeats the whole purpose of the lock.

Exactly. Thanks @dee-kryvenko for this comprehensive list of actions to show this issue. It is really annoying, especially for colleagues who use terraform only. Right now our solution is "whoever wants to run tf apply deletes the lock and runs tf init"... I'd appreciate a usable, cross-platform, cache-enabled solution for this lock file so much! (Thanks for all the work though!)

@notwedtm

This seems like a pretty big issue. Is there a timeline or ETA on a resolution?

@dee-kryvenko

Yeah. Meantime I haven't heard back from security@hashicorp.com even though the auto reply said they ought to get back to me within 72 hours. Doesn't seem like they think it's a big deal.

@yurymuski

Any updates on this?

@LDVSOFT

LDVSOFT commented Sep 2, 2021

This really kills the usage if a user (like me) has several Terraform configurations, and after any provider update any configuration after the first one fails on init because there is already a matching provider in the local cache but no way to verify it. There are two solutions that I see:

  • Enforce signing h1 hashes so that all h1 hashes are saved in the lock file after init on any machine,
  • If Terraform sees a provider in the local cache and no h1 hash to verify it: go and download one, verify that it actually matches, save the hash to the lock file, don't overwrite the file in the cache (I guess this is approximately what terraform providers lock does).
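
The second suggestion can be modelled in a few lines. This is a toy sketch, not Terraform's implementation: the "lock file" is a plain text file with one hash per line, the hashes are passed in as strings rather than computed, and the function name accept_cached_package is made up.

```shell
#!/bin/sh
# Toy model, not Terraform's implementation. The "lock file" here is a plain
# text file with one hash per line, and hashes are passed in as strings.
# Usage: accept_cached_package LOCK_FILE CACHED_HASH UPSTREAM_HASH
# Prints "verified", "recorded" or "mismatch"; fails only on mismatch.
accept_cached_package() {
  lock_file=$1
  cached_hash=$2
  upstream_hash=$3   # hash of a freshly downloaded copy

  # Case 1: the lock file already vouches for the cached package.
  if grep -qxF -e "$cached_hash" "$lock_file"; then
    echo verified
    return 0
  fi

  # Case 2: no matching hash yet. Compare against the fresh download and,
  # if they agree, record the hash. The cached file itself is left untouched.
  if [ "$cached_hash" = "$upstream_hash" ]; then
    printf '%s\n' "$cached_hash" >> "$lock_file"
    echo recorded
    return 0
  fi

  # Case 3: the cached package differs from upstream; refuse to trust it.
  echo mismatch
  return 1
}
```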

@alanyee

alanyee commented Sep 9, 2021

To be exhaustive and for the sake of clarity, this issue affects the current version which is 1.0.6

@enricojonas

Facing the same issue. It would be really helpful to either populate all hashes for all platforms by default or have a target platform configuration in the code somewhere, so we can define which platforms we are running on.

@jfirebaugh

In Terraform's default configuration, terraform init will generate a lock file containing all of the checksums signed by the original provider author across all platforms.

This just doesn't seem to be true, regardless of whether TF_PLUGIN_CACHE_DIR is set or not.

I'm on darwin_amd64. I don't have TF_PLUGIN_CACHE_DIR set. If I start with no .terraform/ and no .terraform.lock.hcl, terraform init generates a lockfile with one h1 entry for each provider.

After I run terraform providers lock -platform=linux_amd64 -platform=darwin_amd64, there are two h1 entries for each provider.

@apparentlymart
Member

If you are seeing terraform init not populate the full set of zh: locks listed in the origin registry for a particular provider even though you are not using a global cache directory and you are not using any local mirrors then please open a new issue for that, with some information on how we can reproduce it.

The current design intent is that if you have no provider_installation block or plugin_cache_dir setting in your CLI configuration, and no environment variables or implied local mirror directories that Terraform would treat the same as setting those, then terraform init should save a full set of hashes that are acceptable for direct registry installation across all platforms that the provider officially supports.

Let's keep this issue about the unfortunate interactions between the lock file and the alternate (non-default) installation options, because fixing a bug in the implementation of the current design is something different (and hopefully much less involved) than improving the design to better support the non-default installation configurations.
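
To find out which side of that line a given machine is on, a quick check of the environment and the CLI configuration file can help. The sketch below looks only at TF_PLUGIN_CACHE_DIR and at plugin_cache_dir or provider_installation in the CLI configuration file; it assumes the Unix default location ~/.terraformrc unless TF_CLI_CONFIG_FILE is set, uses a crude text match rather than a real parser, and does not look for implied local mirror directories. The function name is hypothetical.

```shell
#!/bin/sh
# Report settings that move Terraform away from its default installation
# behavior. Checks only the plugin cache environment variable and the CLI
# configuration file; implied local mirror directories are NOT checked.
# Prints one line per finding and nothing for a default setup.
nondefault_install_settings() {
  # TF_CLI_CONFIG_FILE overrides the default location (~/.terraformrc on Unix).
  config_file=${TF_CLI_CONFIG_FILE:-$HOME/.terraformrc}

  if [ -n "${TF_PLUGIN_CACHE_DIR:-}" ]; then
    echo "env: TF_PLUGIN_CACHE_DIR is set"
  fi

  if [ -f "$config_file" ]; then
    # Crude text match on lines that start with the setting name.
    for setting in plugin_cache_dir provider_installation; do
      if grep -Eq "^[[:space:]]*${setting}[[:space:]={]" "$config_file"; then
        echo "config: $setting in $config_file"
      fi
    done
  fi
}
```

An empty result suggests the default behavior described above should apply; any output points at the alternate installation paths this issue is about.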

@prein

prein commented Nov 3, 2021

Without diving too deep into the thread, is there a recommended workaround?

@spuder
Contributor

spuder commented Nov 10, 2021

A possible workaround is to generate a lock for all platforms:

terraform providers lock \
-platform=windows_amd64 \
-platform=darwin_amd64 \
-platform=linux_amd64 \
-platform=darwin_arm64 \
-platform=linux_arm64

Update: added darwin_arm64 to support Apple M1 processors.

@oatmealb
Copy link

oatmealb commented Jul 12, 2022

The proposed workaround in the previous comment does not work for me. Here's my repro sample. Having checksums for the darwin_arm64 platform, I cannot add checksums for other platforms, e.g. darwin_amd64 and linux_amd64. Am I missing something?

This workaround of disabling plugin_cache_dir works, but this then unbearably slows down local work/iteration.

Also, I pulled the above-linked repro sample on a Linux amd64 platform, where no plugin_cache_dir was (ever) set, and still had to first rm .terraform.lock.hcl && terraform init and only then would terraform providers lock -platform=darwin_amd64 -platform=darwin_arm64 -platform=linux_amd64 succeed. But surely, discarding .terraform.lock.hcl say on every CI run isn't an acceptable workflow? If I didn't discard the lock, I got the familiar error:

terraform providers lock -platform=darwin_amd64 -platform=darwin_arm64 -platform=linux_amd64
- Fetching hashicorp/aws 4.22.0 for darwin_amd64...
- Obtained hashicorp/aws checksums for darwin_amd64 (signed by HashiCorp)
- Fetching hashicorp/tls 3.4.0 for darwin_amd64...
- Fetching hashicorp/kubernetes 2.12.1 for darwin_amd64...
- Obtained hashicorp/kubernetes checksums for darwin_amd64 (signed by HashiCorp)
- Fetching hashicorp/cloudinit 2.2.0 for darwin_amd64...
╷
│ Error: Could not retrieve providers for locking
│
│ Terraform failed to fetch the requested providers for darwin_amd64 in order to calculate their checksums: some providers could not be installed:
│ - registry.terraform.io/hashicorp/cloudinit: the current package for registry.terraform.io/hashicorp/cloudinit 2.2.0 doesn't match any of the checksums previously recorded in the dependency lock
│ file
│ - registry.terraform.io/hashicorp/tls: the current package for registry.terraform.io/hashicorp/tls 3.4.0 doesn't match any of the checksums previously recorded in the dependency lock file.

Linux env:

> uname -a
Linux a_hostname 5.13.0-1031-aws #35~20.04.1-Ubuntu SMP Mon Jun 13 22:30:30 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

jhlwscom added a commit to jhlwscom/azure-sandbox-subscription that referenced this issue Jul 20, 2022
@twbecker

I hate to post a "me too" comment, but having lost the better part of the day to this issue I feel like I have to. From what I can see, there appears to be literally no way to generate lock files that are compatible with other platforms while retaining the advantages of the plugin cache: namely, to avoid having many copies of each provider and having to download them frequently. Additionally, it is not at all obvious from the error messages that the use of the plugin cache is what breaks terraform providers lock. The only workaround I see is deleting the lock file altogether which would seem to nullify the benefit of having it in the first place.

@apparentlymart
Member

We've made a change in the main branch, for the forthcoming Terraform v1.4 release, which changes the interaction between the dependency lock file and the global cache directory in order to avoid the problem described in this issue.

The previous workaround was to run terraform providers lock (with suitable platform arguments) immediately after running terraform init in order to fill out additional checksums for each provider on other platforms.

In Terraform v1.4 something similar to this behavior will now happen automatically: when populating the lock file entry for the first time, terraform init will access the origin registry to obtain the full set of checksums for that provider and write them all into the lock file just as if the plugin cache directory were not present at all. On future runs, with the lock file already populated, Terraform will then use the plugin cache directory as long as its contents match a checksum in the lock file.

This should therefore avoid the problem where running terraform init on a system which has a warm cache will only include the checksum of the package that was already in the cache. It also separately provides an additional guarantee that wasn't possible before: if your cache directory contains a package that doesn't match the upstream signed checksums then Terraform will no longer implicitly trust the local modified copy, and will instead repair the cache to match the official package for your current platform.
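
The described behavior can be summarised as a small decision table. The sketch below is a toy model based only on the description above, not on Terraform's source code; the function name and its yes/no arguments are hypothetical, and the last branch (a cached package that does not match the lock file) is an assumption drawn from the "repair the cache" remark.

```shell
#!/bin/sh
# Toy decision table for the behavior described above; not Terraform source.
# Usage: install_source HAS_LOCK_ENTRY CACHE_MATCHES_LOCK   (each "yes" or "no")
install_source() {
  has_lock_entry=$1
  cache_matches_lock=$2

  if [ "$has_lock_entry" = no ]; then
    # First install: consult the origin registry so the lock entry receives
    # the full checksum set, as if no cache were configured.
    echo "registry"
  elif [ "$cache_matches_lock" = yes ]; then
    # Later runs: the cached package matches a locked checksum.
    echo "cache"
  else
    # Assumption: a missing or mismatching cached package is not trusted,
    # so installation falls back to the registry.
    echo "registry"
  fi
}
```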


This new behavior is available for testing in Terraform v1.4.0-alpha20221109. We don't recommend using alpha releases in production, but since this behavior only affects terraform init there are some procedures to test it without interacting with your production infrastructure, including the following:

  1. Create a temporary branch of the repository containing one of your Terraform configurations and then perform the remaining steps only within that branch.
  2. Have one team member run terraform init and commit any changes to .terraform.lock.hcl to your version control and push it only to the temporary branch.
  3. Have another team member check out that branch and run terraform init again. They should hopefully see Terraform successfully install and verify the same provider versions selected in the previous step.
  4. Both participants should be able to run terraform init again at this point and see Terraform install all plugins from the global cache directory instead of from the origin registry, because both will have an up-to-date dependency lock file.
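
Steps 1 to 3 might look roughly like this on the command line. This is a sketch under several assumptions: the branch name is a placeholder, the remote is called origin, and the terraform on PATH is the alpha build being tested. Step 4 is simply both participants running terraform init once more.

```shell
#!/bin/sh
# Sketch of steps 1-3 above. The branch name is a placeholder, and both
# functions assume the alpha build is the "terraform" found on PATH.
BRANCH="${BRANCH:-try-terraform-1.4-lock}"

# Steps 1 and 2: run by the first team member.
first_member() {
  git checkout -b "$BRANCH" || return 1
  terraform init || return 1
  git add .terraform.lock.hcl || return 1
  # Commit only if init actually changed the lock file.
  git diff --cached --quiet || git commit -m "Update dependency lock file" || return 1
  git push -u origin "$BRANCH"
}

# Step 3: run by the second team member.
second_member() {
  git fetch origin || return 1
  git checkout "$BRANCH" || return 1
  terraform init
}
```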

If you try this and have feedback, please let us know by opening a new GitHub issue in this repository. We'd prefer to have a separate issue for each area of feedback because GitHub issues are not suitable for many concurrent discussions in the same issue. Thanks!

@hashicorp hashicorp locked as resolved and limited conversation to collaborators Nov 15, 2022